Process Substitution

In computing, process substitution is a form of inter-process communication that allows the input or output of a command to appear as a file. The command is substituted in-line, where a file name would normally occur, by the command shell. This allows programs that normally only accept files to directly read from or write to another program.


History

Process substitution was available as a compile-time option for ksh88, the 1988 version of the KornShell from Bell Labs. The rc shell provides the feature as "pipeline branching" in Version 10 Unix, released in 1990. The Bash shell provided process substitution no later than version 1.14, released in 1994 (the feature is present in the GNU source archive of version 1.14.7, as of 12 February 2016).


Example

The following examples use KornShell syntax. The Unix diff command normally accepts the names of two files to compare, or one file name and standard input. Process substitution allows one to compare the output of two programs directly:

    $ diff <(sort file1) <(sort file2)

The <(command) expression tells the command interpreter to run command and make its output appear as a file. The command can be any arbitrarily complex shell command.

Without process substitution, the alternatives are to save the output of the command(s) to a temporary file and then read that file, or to create a named pipe, start one command writing to it in the background, and run the other command with the named pipe as its input. Both alternatives are more cumbersome.

Process substitution can also be used to capture output that would normally go to a file, and redirect it to the input of a process. The Bash syntax for writing to a process is >(command). Here is an example using the tee, wc and gzip commands that counts the lines in a file with wc -l and compresses it with gzip in one pass:

    $ tee >(wc -l >&2) < bigfile | gzip > bigfile.gz
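The comparison between process substitution and its alternatives can be put side by side. The following is a minimal sketch, assuming Bash (or another shell with process substitution) and a scratch directory created with mktemp; the sample data and the names workdir, file2.sorted and sort2.fifo are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample inputs: the same lines in a different order.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/file1"
printf 'fig\npear\napple\n' > "$workdir/file2"

# 1. Process substitution: nothing to create or clean up by hand.
if diff <(sort "$workdir/file1") <(sort "$workdir/file2") > /dev/null; then
  verdict_psub=same
else
  verdict_psub=different
fi

# 2. Temporary file: write one sorted copy to disk, then read it back.
sort "$workdir/file2" > "$workdir/file2.sorted"
if sort "$workdir/file1" | diff - "$workdir/file2.sorted" > /dev/null; then
  verdict_tmpfile=same
else
  verdict_tmpfile=different
fi
rm "$workdir/file2.sorted"

# 3. Named pipe: start the writer in the background, then run the reader.
mkfifo "$workdir/sort2.fifo"
sort "$workdir/file2" > "$workdir/sort2.fifo" &
if sort "$workdir/file1" | diff - "$workdir/sort2.fifo" > /dev/null; then
  verdict_fifo=same
else
  verdict_fifo=different
fi
wait                      # let the background writer finish
rm "$workdir/sort2.fifo"

echo "$verdict_psub $verdict_tmpfile $verdict_fifo"
rm -rf "$workdir"
```

All three approaches should reach the same verdict; only the first avoids the extra create and remove steps.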


Advantages

The main advantages of process substitution over its alternatives are:

* Simplicity: The commands can be given in-line; there is no need to save temporary files or create named pipes first.
* Performance: Reading directly from another process is often faster than having to write a temporary file to disk, then read it back in. This also saves disk space.
* Parallelism: The substituted process can be running concurrently with the command reading its output or writing its input, taking advantage of multiprocessing to reduce the total time for the computation.
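As a rough illustration of the parallelism point, the sketch below starts two slow producers through process substitution and measures the wall-clock time with the shell's SECONDS variable. It assumes Bash; the two-second sleeps stand in for real work and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# SECONDS counts whole seconds since it was last assigned.
SECONDS=0

# Both substituted commands are started before cat runs, so the two
# sleeps overlap instead of running one after the other.
output=$(cat <(sleep 2; echo first) <(sleep 2; echo second))
elapsed=$SECONDS

printf '%s\n' "$output"
echo "elapsed: about ${elapsed}s; running the producers back to back would need at least 4s"
```

Because SECONDS has whole-second resolution, the measured value is only approximate; the point is that it stays below the sum of the two delays.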


Mechanism

Under the hood, process substitution has two implementations. On systems which support /dev/fd (most Unix-like systems), the shell calls the pipe() system call to create a new anonymous pipe, builds the string /dev/fd/$fd from the pipe file descriptor $fd that it leaves open for the command, and substitutes that string on the command line. On systems without /dev/fd support, it calls mkfifo with a new temporary filename to create a named pipe, and substitutes this filename on the command line.

To illustrate the steps involved, consider the following simple process substitution on a system with /dev/fd support:

    $ diff file1 <(sort file2)

The steps the shell performs are:

1. Create a new anonymous pipe. This pipe will be accessible with something like /dev/fd/63; you can see it with a command like echo <(true).
2. Execute the substituted command in the background (sort file2 in this case), piping its output to the anonymous pipe.
3. Execute the primary command, replacing the substituted command with the path of the anonymous pipe. In this case, the full command might expand to something like diff file1 /dev/fd/63.
4. When execution is finished, close the anonymous pipe.

For named pipes, the execution differs solely in the creation and deletion of the pipe; they are created with mkfifo (which is given a new temporary file name) and removed with unlink. All other aspects remain the same.
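The substituted path can be inspected directly from the shell. The following is a small sketch, assuming Bash; the variable names are illustrative, and the descriptor number in the path is chosen by the shell and is not guaranteed to be 63.

```shell
#!/usr/bin/env bash
# Print the word the shell puts on the command line in place of <(...).
substituted_path=$(echo <(true))
echo "substituted path: $substituted_path"

# Guess which of the two implementations produced it from the path's shape.
case $substituted_path in
  /dev/fd/*) mechanism="anonymous pipe exposed through /dev/fd" ;;
  *)         mechanism="named pipe (FIFO) in the file system" ;;
esac
echo "mechanism: $mechanism"

# Either way, the path refers to a pipe rather than a regular file.
if [ -p <(true) ]; then is_pipe=yes; else is_pipe=no; fi
if [ -f <(true) ]; then is_regular=yes; else is_regular=no; fi
echo "pipe: $is_pipe, regular file: $is_regular"
```

The -p and -f checks are the same kind of file-type test that a program might apply before opening its argument.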


Limitations

The "files" created are not seekable, which means the process reading or writing to the file cannot perform random access; it must read or write once from start to finish. Programs that explicitly check the type of a file before opening it may refuse to work with process substitution, because the "file" resulting from process substitution is not a regular file. Additionally, before Bash 4.4 (released in September 2016), it was not possible to obtain the exit code of a process substitution command from the shell that created the process substitution.
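Two of these limitations can be observed from the shell itself. The sketch below assumes a recent Bash (4.4 or later, going by the release mentioned above) on a system with /dev/fd support; read_twice is a hypothetical helper, and using wait with $! to collect the status is one commonly used approach rather than the only one.

```shell
#!/usr/bin/env bash
# Hypothetical helper that opens and reads the same path two times.
read_twice() {
  cat "$1"
  cat "$1"
}

# A regular file can be reread, so its content comes out twice.
tmpfile=$(mktemp)
echo hello > "$tmpfile"
from_file=$(read_twice "$tmpfile")
rm -f "$tmpfile"

# The pipe behind a process substitution is drained by the first read,
# so the second read finds nothing left (assumes /dev/fd support).
from_pipe=$(read_twice <(echo hello))

printf 'regular file:\n%s\n' "$from_file"
printf 'process substitution:\n%s\n' "$from_pipe"

# Exit status: the shell records the substituted process in $!,
# and wait can then report how that process ended.
cat <(exit 3)
psub_pid=$!
psub_status=0
wait "$psub_pid" || psub_status=$?
echo "exit status of the substituted command: $psub_status"
```

On older Bash versions the wait call is expected to fail instead of reporting the status, which is the limitation described above.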


See also

* Pipeline (Unix)
* Named pipe
* Command substitution
* Comparison of command shells
* Anonymous pipe


Further reading

* Frazier, Mitch (22 May 2008). "Bash Process Substitution". Linux Journal. http://www.linuxjournal.com/content/shell-process-redirection (accessed 1 Oct 2011).